IBM Research TRECVID-2009 Video Retrieval System
Authors
Abstract
In this paper, we describe the IBM Research system for indexing, analysis, and retrieval of video as applied to the TRECVID-2009 video retrieval benchmark. (Affiliations: IBM T. J. Watson Research Center, Hawthorne, NY, USA; IBM China Research Lab, Beijing, China; Dept. of Computer Science, Columbia University; Machine Learning Dept., Carnegie Mellon University.)

A. High-Level Concept Detection: This year, the focus of system improvements was on combining global and local features, automatically constructing training data from the web domain, and large-scale detection using Hadoop. We submitted six runs:

1. A ibm.Global 6: baseline runs using 98 types of global features and 3 SVM learning methods;
2. A ibm.Combine2 5: fusion of the 2 best of 5 candidate models built on global/local features;
3. A ibm.CombineMore 4: fusion of all 5 candidate models on global/local features;
4. A ibm.Single+08 3: the single best of the 5 candidate models, plus the models from 2008;
5. C ibm.Combine2+FlkBox 2: A ibm.Combine2 5 combined with training data automatically extracted from Flickr;
6. A ibm.BOR 1: best overall run, compiled from the best model per concept based on held-out performance.

Overall, almost all of the individual components improved mean average precision when fused with the baseline results. To summarize, we draw the following observations from our evaluation results: 1) global and local features are complementary, and their fusion outperforms either type of feature alone; 2) the more features are combined, the better the performance, even with simple combination rules; 3) development data collected automatically from the web domain proved useful for a number of concepts, although its average performance is not comparable with that of manually selected training data, partly because of the large domain gap between web images and documentary video.

B. Copy Detection:
1. ibm.v.balanced.meanBAL: video-only submission produced by fusing 2 types of fingerprints, using the mean score of each constituent as its weighting factor.
2. ibm.v.balanced.medianBAL: as above, but using median scores as weighting factors.
3. ibm.v.nofa.meanNOFA: similar to the first run, but with the internal weights of our temporal method tuned more conservatively and a higher score threshold applied to our color-feature-based method.
4. ibm.v.nofa.medianNOFA: similar to the meanNOFA run, but using median scores for weighting.
5. ibm.m.balanced.meanFuse: for the audio+video runs, we used the same 2 video-only methods, plus another video method and a temporal audio method; in this run, the mean score of each constituent was used for weighting.
6. ibm.m.balanced.medianFuse: as in the above run, but using median scores for weighting.
7. ibm.m.nofa.meanFuse: as with the video-only runs, we adjusted the internal parameters of the temporal methods and the thresholds of the other methods.
8. ibm.m.nofa.medianFuse: as in the m.nofa.meanFuse run, but using median scores for weighting.

Among these runs, we found that using the median score to fuse results from different methods was superior; there were only isolated instances in which the mean outperformed the median. We believe this is because the median is less sensitive to outliers, and in our system the temporal methods produced more extreme scores than the other methods. Although we concentrated our efforts on the video methods, the audio feature proved an important factor in this task, which suggests that more research on audio matching could deliver further gains. With our SIFTogram technique we did not preprocess the input as thoroughly as with the color-based method, and we believe doing so would give a further, incremental boost. Overall, in looking at the results, we were surprised at the difficulty of choosing the threshold for the actual NDCR metric.
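The mean- and median-weighted fusion described in the runs above can be sketched as follows. This is a minimal illustration under one reading of the run descriptions (each constituent method's weight is a summary statistic of its own scores); the data layout, method names, and scores are hypothetical placeholders, not the actual IBM implementation.

```python
from statistics import mean, median

def fuse(scores_by_method, stat=median):
    """Weighted late fusion: each constituent method's weight is a summary
    statistic (mean or median) of that method's own scores. Assumes the
    scores of different methods are on comparable scales."""
    # Per-method weight = mean/median over all of that method's scores.
    weights = {m: stat(s.values()) for m, s in scores_by_method.items()}
    pairs = set().union(*(s.keys() for s in scores_by_method.values()))
    fused = {}
    for p in pairs:
        contributing = [(weights[m], s[p])
                        for m, s in scores_by_method.items() if p in s]
        total_w = sum(w for w, _ in contributing)
        fused[p] = sum(w * sc for w, sc in contributing) / total_w
    return fused

# Hypothetical fingerprint scores for three query-reference pairs; the
# "temporal" method deliberately produces more extreme values.
scores = {
    "temporal": {("q1", "r1"): 0.9, ("q1", "r2"): 0.1, ("q1", "r3"): 0.95},
    "color":    {("q1", "r1"): 0.7, ("q1", "r2"): 0.4, ("q1", "r3"): 0.1},
}
by_median = fuse(scores, stat=median)
by_mean = fuse(scores, stat=mean)
```

Because the median weight discounts a method whose typical score is moderate even when it has extreme outliers, median weighting tracks the abstract's observation that it is less sensitive to the temporal method's extreme values.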
We also note that the “balanced” profile admits very few false alarms, and we suggest that a more truly balanced profile be included in the future. Keywords—Multimedia indexing, content-based retrieval, Support Vector Machines, Copy Detection.
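The threshold-selection difficulty noted above can be made concrete. Assuming the standard TRECVID copy-detection cost model, NDCR = P_miss + β·R_FA with β = C_FA / (C_miss · R_target), one can sweep candidate score thresholds on held-out data and keep the one minimizing NDCR. The sketch below uses that assumed definition; the β values, data layout, and numbers are illustrative, not taken from the IBM system.

```python
def ndcr(p_miss, r_fa, beta=2.0):
    # Assumed cost model: NDCR = P_miss + beta * R_FA, where
    # beta = C_FA / (C_miss * R_target). A small beta (balanced profile)
    # tolerates some false alarms; a very large beta (no-false-alarm
    # profile) makes almost any false alarm dominate the cost.
    return p_miss + beta * r_fa

def best_threshold(results, n_copies, query_hours, beta=2.0):
    """Sweep candidate thresholds over held-out detection results and
    return (threshold, cost) minimizing NDCR. `results` is a list of
    (score, is_true_copy) pairs -- a hypothetical layout for illustration."""
    best = None
    for t in sorted({s for s, _ in results}):
        misses = sum(1 for s, ok in results if ok and s < t)       # true copies rejected
        fas = sum(1 for s, ok in results if not ok and s >= t)     # false alarms accepted
        cost = ndcr(misses / n_copies, fas / query_hours, beta)
        if best is None or cost < best[1]:
            best = (t, cost)
    return best

# Toy held-out data: two true copies and two non-copies.
results = [(0.9, True), (0.8, False), (0.7, True), (0.2, False)]
best = best_threshold(results, n_copies=2, query_hours=1.0)
```

With a large β the minimizing threshold jumps toward rejecting everything below the top score, which illustrates why picking one operating point for the actual NDCR metric is delicate.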
Publication date: 2009